\[~\]
\[~\]
\[~\]
Nowadays, thanks to technological advances, there is an amount of data that allows us to understand whether or not there is a problem with a person's health.
In this project we explain and try to predict whether a patient should be diagnosed with heart disease or not.
\[~\]
\[~\]
\[~\]
Kaggle is the platform where I found this interesting dataset, to which we apply Bayesian inference as the main scope of this project.
The dataset is composed of the following features:
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_double(),
## sex = col_double(),
## cp = col_double(),
## trtbps = col_double(),
## chol = col_double(),
## fbs = col_double(),
## restecg = col_double(),
## thalachh = col_double(),
## exng = col_double(),
## oldpeak = col_double(),
## slp = col_double(),
## caa = col_double(),
## thall = col_double(),
## output = col_double()
## )
## # A tibble: 6 x 14
## age sex cp trtbps chol fbs restecg thalachh exng oldpeak slp
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## # ... with 3 more variables: caa <dbl>, thall <dbl>, output <dbl>
\[~\]
Analyzing these features, I recognized the following:
\[~\]
The main features detected are:
\[~\]
## age sex cp trtbps
## Min. :29.00 Min. :0.0000 Min. :0.0000 Min. : 94.0
## 1st Qu.:48.00 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:120.0
## Median :55.50 Median :1.0000 Median :1.0000 Median :130.0
## Mean :54.42 Mean :0.6821 Mean :0.9636 Mean :131.6
## 3rd Qu.:61.00 3rd Qu.:1.0000 3rd Qu.:2.0000 3rd Qu.:140.0
## Max. :77.00 Max. :1.0000 Max. :3.0000 Max. :200.0
## chol fbs restecg thalachh
## Min. :126.0 Min. :0.000 Min. :0.0000 Min. : 71.0
## 1st Qu.:211.0 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:133.2
## Median :240.5 Median :0.000 Median :1.0000 Median :152.5
## Mean :246.5 Mean :0.149 Mean :0.5265 Mean :149.6
## 3rd Qu.:274.8 3rd Qu.:0.000 3rd Qu.:1.0000 3rd Qu.:166.0
## Max. :564.0 Max. :1.000 Max. :2.0000 Max. :202.0
## exng oldpeak slp caa
## Min. :0.0000 Min. :0.000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.0000
## Median :0.0000 Median :0.800 Median :1.000 Median :0.0000
## Mean :0.3278 Mean :1.043 Mean :1.397 Mean :0.7185
## 3rd Qu.:1.0000 3rd Qu.:1.600 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :1.0000 Max. :6.200 Max. :2.000 Max. :4.0000
## thall output
## Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:0.000
## Median :2.000 Median :1.000
## Mean :2.315 Mean :0.543
## 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :3.000 Max. :1.000
\[~\]
\[~\]
In this subsection we describe the dataset graphically, to get a first taste of which model might be best suited to this case. Below we show some interesting plots.
\[~\]
\[~\]
PCA is a tool that allows us to show the behaviour of the patients (treated as points in this space) in our data. In this way we can see which patients are most similar to each other.
\[~\]
\[~\]
It is natural to expect some loss in this representation, given the number of features we want to consider, but here we actually capture 34.9% of the variance in the entire dataset using two principal components.
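A minimal sketch of the PCA behind this figure (hypothetical: the code is not shown in the report, and `heart` is assumed to be the data frame loaded from the Kaggle file):

```r
# Standardize the predictors and run PCA; the response "output" is excluded.
pca <- prcomp(heart[, setdiff(names(heart), "output")],
              center = TRUE, scale. = TRUE)
# Proportion of variance captured by the first two components
# (about 34.9% on this dataset):
sum(summary(pca)$importance["Proportion of Variance", 1:2])
```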
\[~\]
\[~\]
Here, we want to analyze the percentages of each categorical variable:
\[~\]
The most frequent category is typical angina (47.4%), which is defined as substernal chest pain precipitated by physical exertion or emotional stress and relieved with rest or nitroglycerin (it reflects inadequate flow of blood and oxygen to the heart). Women and elderly patients usually have atypical symptoms both at rest and during stress, often in the setting of nonobstructive coronary artery disease (CAD).
\[~\]
The ST/heart rate slope (ST/HR slope) has been proposed as a more accurate ECG criterion for diagnosing significant coronary artery disease (CAD). The most frequent slope categories are the first and second types (flat and downsloping).
\[~\]
The most common case is 0 major vessels affected by any damage, but in a small number of cases we see that several major vessels within the heart have a problem.
\[~\]
For the thalassemia feature, the most frequent category corresponds to a fixed defect.
\[~\]
Angina may feel like pressure in the chest, jaw or arm. It frequently occurs with exercise or stress. Some people with angina also report feeling lightheaded, overly tired, short of breath or nauseated. As the heart pumps harder to keep up with what you are doing, it needs more oxygen-rich blood. This relates to the exercise-induced angina feature, which is absent for most patients.
\[~\]
\[~\]
\[~\]
Here, we illustrate the main features of the quantitative data, after having seen the categorical data in the previous section:
\[~\]
hist.and.summary('age', 'Persons Age')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 48.00 55.50 54.42 61.00 77.00
hist.and.summary('chol', 'Cholestoral')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 126.0 211.0 240.5 246.5 274.8 564.0
hist.and.summary('thalachh', 'Maximum Heart Rate Achieved')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 71.0 133.2 152.5 149.6 166.0 202.0
hist.and.summary('trtbps', 'Resting Blood Pressure')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 94.0 120.0 130.0 131.6 140.0 200.0
hist.and.summary('oldpeak', 'Previous Peak')
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.800 1.043 1.600 6.200
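The `hist.and.summary()` helper is not defined in this chunk; a hypothetical reconstruction, inferred from the calls and output above (`dataset` is assumed to hold the heart data):

```r
# Draw a histogram of one column of the dataset and return its summary.
hist.and.summary <- function(feature, title) {
  x <- dataset[[feature]]
  hist(x, main = title, xlab = title, col = "steelblue")
  summary(x)
}
```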
\[~\]
After this, we also consider the densities, distinguishing between patients who had a heart attack and those who did not:
\[~\]
dense.chart('age', 'Persons Age')
dense.chart('chol', 'Cholestoral')
dense.chart('thalachh', 'Maximum Heart Rate Achieved')
dense.chart('trtbps', 'Resting Blood Pressure')
dense.chart('oldpeak', 'Previous Peak')
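The `dense.chart()` helper is likewise not shown; a hypothetical sketch of what it plausibly does (`dataset` with an `output` column is assumed):

```r
# Overlay the density of a feature for the two output groups.
dense.chart <- function(feature, title) {
  d0 <- density(dataset[[feature]][dataset$output == 0])
  d1 <- density(dataset[[feature]][dataset$output == 1])
  plot(d1, main = title, xlab = title, col = "red",
       ylim = range(d0$y, d1$y))
  lines(d0, col = "blue")
  legend("topright", c("heart attack", "no heart attack"),
         col = c("red", "blue"), lty = 1)
}
```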
\[~\]
Here, we want to highlight the correlations between the features, treated as qualitative, quantitative, and without any distinction.
\[~\]
\[~\]
The main goal is to carry out a fully Bayesian analysis aimed at understanding whether a person has a heart attack or not.
So, the response variable \(Y_i\) (the "output" feature in our dataset) is the heart attack indicator \(\in \{0,1\}\), and the predictor variables (\(x_i \in \mathbb{R}^{+}\)) have been chosen using the glm() function (we are cheating a little, but in this way we can be sure that we are catching the right variables), which fits generalized linear models, as we can see below:
\[~\]
##
## Call:
## glm(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 +
## x10 + x11 + x12 + x13, family = binomial(), data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5751 -0.3868 0.1514 0.5841 2.6239
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.304169 2.577891 1.282 0.199936
## x1 -0.001469 0.023464 -0.063 0.950062
## x2 -1.750930 0.468199 -3.740 0.000184 ***
## x3 0.847283 0.185548 4.566 4.96e-06 ***
## x4 -0.020188 0.010386 -1.944 0.051916 .
## x5 -0.004489 0.003806 -1.179 0.238252
## x6 0.073463 0.532452 0.138 0.890263
## x7 0.450607 0.348505 1.293 0.196021
## x8 0.023134 0.010449 2.214 0.026835 *
## x9 -0.981017 0.409806 -2.394 0.016672 *
## x10 -0.523604 0.214467 -2.441 0.014630 *
## x11 0.589074 0.349864 1.684 0.092236 .
## x12 -0.826015 0.201922 -4.091 4.30e-05 ***
## x13 -0.887203 0.290730 -3.052 0.002276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 416.42 on 301 degrees of freedom
## Residual deviance: 210.35 on 288 degrees of freedom
## AIC: 238.35
##
## Number of Fisher Scoring iterations: 6
\[~\]
As we can see above, the glm fit rejects the hypothesis of including the variables:
… and the intercept is also not a good choice to admit into our model.
In summary, we have N = 302 observations. We then decided to consider two models and compare them to see which one turns out to be the best in our analysis.
\[~\]
\[~\]
Likelihood \(\pi(y_{obs}|\theta)\): measures the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters. It is formed from the joint probability distribution of the sample, but viewed and used as a function of the parameters only, thus treating the random variables as fixed at the observed values.
Logistic regression model: Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
Logit link function: it uses the cumulative distribution function (CDF) of the logistic distribution. A benefit of using the logit is that the coefficients can be interpreted in terms of odds ratios.
Cloglog link function: unlike logit and probit, it is asymmetric. It is best used when the probability of an event is very small or very large. The complementary log-log function approaches 0 much more slowly than it approaches 1. The cloglog model is closely related to continuous-time models for the occurrence of events, so it has important applications in survival analysis and hazard modeling.
Prior \(\pi(\theta)\): the probability distribution that expresses one's beliefs about this quantity before some evidence is taken into account.
Deviance: a goodness-of-fit statistic for a statistical model, often used for statistical hypothesis testing. It generalizes the idea of using the residual sum of squares (RSS) in ordinary least squares to cases where model fitting is achieved by maximum likelihood.
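To make the logit vs. cloglog contrast concrete, here is a small sketch of the two inverse link functions, mapping a linear predictor \(\eta\) to a probability (the function names are mine, for illustration):

```r
inv.logit   <- function(eta) 1 / (1 + exp(-eta))  # logistic CDF, symmetric
inv.cloglog <- function(eta) 1 - exp(-exp(eta))   # asymmetric

# At eta = 0 the logit gives p = 0.5, while the cloglog gives
# p = 1 - exp(-1) (about 0.632): the cloglog curve rises towards 1
# faster than it falls towards 0.
c(logit = inv.logit(0), cloglog = inv.cloglog(0))
```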
\[~\]
\[~\]
In the first model, we decide to consider the following logistic regression model with the logit link function:
\[ Y_i \sim Bern(p_i) \\ logit(p_i) = \beta_{2} \cdot x_{2_i} + \beta_{3} \cdot x_{3_i} + \beta_{4} \cdot x_{4_i} +\beta_{8} \cdot x_{8_i} + \beta_{9} \cdot x_{9_i} + \beta_{10} \cdot x_{10_i} + \beta_{11} \cdot x_{11_i} + \beta_{12} \cdot x_{12_i} + \beta_{13} \cdot x_{13_i} \] The prior beta parameters are chosen approximately, following the estimates seen in the previous subsection regarding the glm() output. So, the beta prior parameters are normally distributed as follows:
\[ \beta_2 \sim N(\mu_2 = -1, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_3 \sim N(\mu_3 = 0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_4 \sim N(\mu_4 = 0, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_8 \sim N(\mu_8 = 0, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_9 \sim N(\mu_9 = -0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{10} \sim N(\mu_{10} = 0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{11} \sim N(\mu_{11} = 0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{12} \sim N(\mu_{12} = -0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \beta_{13} \sim N(\mu_{13} = -0.5, \tau^{2} = 1.0 \cdot 10^{-6}) \\ \]
The main reason is that for this binary classification we prefer logistic regression, to model outcomes that take only the values 0 or 1.
\[~\]
\[~\]
As we can see, we implemented the model using RJags:
\[~\]
\[~\]
model <- function(){
  # Likelihood
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    logit(p[i]) <- beta2*x2[i] + beta3*x3[i] + beta4*x4[i] + beta8*x8[i] +
      beta9*x9[i] + beta10*x10[i] + beta11*x11[i] + beta12*x12[i] +
      beta13*x13[i] # intercept and non-significant variables omitted
  }
  # Priors on the beta parameters
  beta2 ~ dnorm(-1, 1.0E-6)
  beta3 ~ dnorm(0.5, 1.0E-6)
  beta4 ~ dnorm(0, 1.0E-6)
  beta8 ~ dnorm(0, 1.0E-6)
  beta9 ~ dnorm(-0.5, 1.0E-6)
  beta10 ~ dnorm(0.5, 1.0E-6)
  beta11 ~ dnorm(0.5, 1.0E-6)
  beta12 ~ dnorm(-0.5, 1.0E-6)
  beta13 ~ dnorm(-0.5, 1.0E-6)
}
Remark: in JAGS the second parameter of dnorm is the precision, not the variance, so \(\tau^2 = \frac{1}{\sigma^{2}}\).
\[~\]
# Passing the data for RJags
data.jags <- list("y" = y, "N" = N,
"x2" = x2, "x3" = x3, "x4" = x4, "x8" = x8, "x9" = x9, "x10" = x10, "x11" = x11, "x12" = x12, "x13" = x13)
# Defining parameters of interest to show after running RJags
mod.params <- c("beta2", "beta3", "beta4", "beta8", "beta9", "beta10", "beta11", "beta12", "beta13")
\[~\]
# Run JAGS
set.seed(123)
n.chains <- 3
mod.fit <- jags(data = data.jags, # DATA
model.file = model, # MODEL
parameters.to.save = mod.params, # TRACKING
n.chains = n.chains, n.iter = 10000, n.burnin = 1000, n.thin=10) # MCMC
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 302
## Unobserved stochastic nodes: 9
## Total graph size: 3840
##
## Initializing model
mod.fit
## Inference for Bugs model at "C:/Users/Jeremy/AppData/Local/Temp/RtmpknYfOZ/model563036061b32.txt", fit using jags,
## 3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 2700 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## beta10 -0.489 0.212 -0.909 -0.628 -0.490 -0.344 -0.081 1.001 2700
## beta11 0.683 0.344 0.021 0.440 0.687 0.909 1.369 1.001 2700
## beta12 -0.846 0.199 -1.252 -0.977 -0.843 -0.711 -0.471 1.002 1900
## beta13 -0.851 0.277 -1.401 -1.038 -0.842 -0.659 -0.332 1.001 2700
## beta2 -1.652 0.445 -2.530 -1.955 -1.640 -1.356 -0.810 1.001 2700
## beta3 0.881 0.187 0.524 0.756 0.879 1.006 1.253 1.002 2700
## beta4 -0.015 0.008 -0.032 -0.020 -0.015 -0.009 0.001 1.002 1700
## beta8 0.031 0.008 0.016 0.026 0.031 0.037 0.047 1.002 1600
## beta9 -0.889 0.414 -1.688 -1.173 -0.895 -0.605 -0.069 1.001 2700
## deviance 225.361 4.463 218.784 222.067 224.697 227.789 235.802 1.001 2700
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 10.0 and DIC = 235.3
## DIC is an estimate of expected predictive error (lower deviance is better).
\[~\]
In this last run of the MCMC we have interesting results:
mu.vect and sd.vect: the estimated posterior means and standard deviations after simulating the MCMC.
the percentiles (quantiles): the quantiles of the interval where the parameter is likely to lie.
Rhat: values near 1 suggest convergence.
n.eff: the effective number of samples, i.e. how much independent information the chains carry once they reach the stationary region.
Just to have a look, let's check what the fitted curve looks like if we use the obtained estimates:
\[~\]
\[~\]
Here, we show the univariate trace-plots of the simulations of each parameter:
\[~\]
\[~\] In this section we want to see the empirical average \(\mathbf{\hat{I}}_{t}\) as \(t \, = \, 1,...,T\) increases. Before starting, we write the formula for \(\mathbf{\hat{I}}_{t}\):
\[~\]
\[ \mathbf{I} \approx \mathbf{\hat{I}}_{t} = \frac{1}{t} \sum_{i=1}^{t} h(\theta^{(i)}) \]
\[~\]
It is important to write \(\approx\) because this estimate leverages the strong law of large numbers (SLLN), and we also want to confirm this assumption empirically! So, let's see the results:
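A sketch of how the running average \(\mathbf{\hat{I}}_{t}\) can be computed for one parameter, e.g. \(\beta_2\), from the simulations saved in `mod.fit` (the object produced by the JAGS run above):

```r
# Running empirical average of the beta2 draws: cumulative sum divided
# by the number of draws seen so far.
draws <- mod.fit$BUGSoutput$sims.matrix[, "beta2"]
running.mean <- cumsum(draws) / seq_along(draws)
plot(running.mean, type = "l", xlab = "t", ylab = "running mean")
# By the SLLN the curve should flatten around the posterior mean as t grows.
```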
\[~\]
\[~\]
\[~\]
We are considering other interesting plots, underlining also each chain alone:
\[~\] Now, we want to see the autocorrelations of each parameter:
## beta10 beta11 beta12 beta13 beta2
## Lag 0 1.000000000 1.000000000 1.000000000 1.000000000 1.000000000
## Lag 10 0.028749715 0.013027320 0.011401177 0.008061257 0.009605275
## Lag 50 0.004853036 0.007741027 0.006044363 -0.018532707 0.032820945
## Lag 100 -0.062010722 -0.017318088 0.005858652 0.022871691 0.044325748
## Lag 500 0.022963511 0.034443073 -0.006998113 0.004928576 0.002334635
## beta3 beta4 beta8 beta9 deviance
## Lag 0 1.00000000 1.000000000 1.00000000 1.000000000 1.000000e+00
## Lag 10 0.02486498 0.001619878 0.04591646 -0.004177052 -1.524460e-02
## Lag 50 0.02580283 -0.048039671 -0.04290369 -0.023810367 -9.722008e-03
## Lag 100 0.02351930 -0.011874447 -0.01503457 -0.028663890 -9.939696e-05
## Lag 500 0.03144189 0.021556527 0.01383545 0.018783666 1.420195e-02
\[~\] The ACFs show good behaviour: we need samples that are independent of each other across iterations. Here we see that the correlations drop to near zero after the first lag, which is a good sign!
We can confirm our intuition with the values returned by autocorr.diag(), which returns the vector of average autocorrelations across all chains.
\[~\]
\[~\] Here, we want to analyze the approximation error of the MCMC sampling. Essentially, we consider the Monte Carlo standard error (MCSE), i.e. the square root of the variance below.
The variance formula in MCMC sampling is:
\[ \mathbf{V}[\hat{I}_{t}] = \frac{Var_{\pi}[h(X_{1})]}{t_{eff}} = \Big( 1 + 2 \sum_{k=1}^{\infty} \rho_{k}\Big)\frac{\sigma^{2}}{T} \] where \(t_{eff}\) is the effective sample size (ESS). The idea is to have a sort of “exchange rate” between dependent and independent samples.
We can say that 1,000 samples from one Markov chain may be worth about as much as 70 independent samples, because the MCMC samples are highly correlated.
Conversely, 1,000 samples from a different Markov chain may be worth about as much as 400 independent samples because, although the MCMC samples are dependent, they are only weakly correlated.
…More formal details on the slides.
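The "exchange rate" above can be sketched directly from the formula \(t_{eff} = T / (1 + 2 \sum_k \rho_k)\). This is a simplified estimator of my own (truncating the sum at the first non-positive autocorrelation, a common heuristic); the n.eff values reported by jags() come from a more refined procedure:

```r
# Crude effective sample size for one chain of draws.
ess <- function(x) {
  rho <- acf(x, plot = FALSE, lag.max = length(x) - 1)$acf[-1]
  cut <- which(rho <= 0)[1]
  if (!is.na(cut)) rho <- rho[seq_len(cut - 1)]
  length(x) / (1 + 2 * sum(rho))
}
```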
As we can see, model 1 has these effective sample sizes for each parameter involved in this analysis:
## beta10 beta11 beta12 beta13 beta2 beta3 beta4 beta8
## 2700 2700 1900 2700 2700 2700 1700 1600
## beta9 deviance
## 2700 2700
This means that our samples, out of 10000 iterations, are weakly correlated.
…now let's move on to calculating the MCSE.
In this case the MCSE is the square root of the formula written above; the results, computed on the pooled chains, are:
n <- length(colnames(mod.fit$BUGSoutput$sims.matrix))
mcse_dataframe <- data.frame(MCSE_jointly = rep(NA, n))
rownames(mcse_dataframe) <- colnames(mod.fit$BUGSoutput$sims.matrix)[1:n]
for(colname in colnames(mod.fit$BUGSoutput$sims.matrix)[1:n]){
mcse_dataframe[colname, "MCSE_jointly"] <- LaplacesDemon::MCSE(mod.fit$BUGSoutput$sims.matrix[ , colname])
}
mcse_dataframe
## MCSE_jointly
## beta10 0.0039161885
## beta11 0.0068608122
## beta12 0.0038219783
## beta13 0.0053805117
## beta2 0.0084766703
## beta3 0.0035720520
## beta4 0.0001623317
## beta8 0.0001547387
## beta9 0.0082876402
## deviance 0.0920766126
\[~\] As we can see, \(\beta_2\) has the highest uncertainty (largest MCSE) considering the pooled chains.
\[~\]
\[~\] The uncertainty is measured using the variability of each parameter with respect to its absolute expectation, i.e. the standard deviation divided by the absolute value of the mean:
## mean sd variability
## beta10 -0.48896601 0.211591852 0.43273325
## beta11 0.68316497 0.343508862 0.50281978
## beta12 -0.84554789 0.198878026 0.23520611
## beta13 -0.85142734 0.277368178 0.32576846
## beta2 -1.65152293 0.444823897 0.26934164
## beta3 0.88106225 0.186628248 0.21182186
## beta4 -0.01462894 0.008258434 0.56452715
## beta8 0.03139029 0.007968318 0.25384658
## beta9 -0.88885032 0.414351227 0.46616536
## deviance 225.36094747 4.463078811 0.01980414
The highest posterior uncertainty is for \(\beta_4\).
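The "variability" column above can be reproduced from the pooled simulation matrix as the ratio sd / |mean| (a minimal sketch; `mod.fit` is the JAGS fit from before):

```r
# Coefficient of variation of each tracked quantity over the pooled chains.
sims <- mod.fit$BUGSoutput$sims.matrix
variability <- apply(sims, 2, sd) / abs(colMeans(sims))
round(variability, 3)
```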
\[~\]
\[~\] Here, we want to understand whether, with these multiple simulations of the Markov chains, we achieve convergence. We consider one of the tools seen in the last lectures; for assessing convergence it is more powerful than the other tools.
\[~\]
\[~\] The Heidelberger-Welch diagnostic implements a convergence test, and removes up to half the chain in order to ensure that the means are estimated from a chain that has converged.
The convergence test uses the Cramer-von Mises statistic to test the null hypothesis that the sampled values come from a stationary distribution.
The test is applied successively, first to the whole chain, then after discarding the first 10%, 20%, … of the chain, until either the null hypothesis is accepted or 50% of the chain has been discarded.
The latter outcome constitutes "failure" of the stationarity test and indicates that a longer MCMC run is needed. If the stationarity test is passed, the number of iterations to keep and the number to discard are reported.
The half-width test calculates a 95% confidence interval for the mean, using the portion of the chain that passed the stationarity test.
Half the width of this interval is compared with the estimate of the mean. If the ratio between the half-width and the mean is lower than eps, the half-width test is passed. Otherwise the sample is deemed not long enough to estimate the mean with sufficient accuracy.
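The per-chain output below presumably comes from coda's `heidel.diag()`; a sketch of the call (the conversion of the R2jags fit to an mcmc.list is an assumption about how the report produced it):

```r
# Heidelberger-Welch stationarity and half-width tests, chain by chain.
library(coda)
chains <- as.mcmc(mod.fit)    # one mcmc object per chain
heidel.diag(chains, eps = 0.1, pvalue = 0.05)
```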
\[~\]
## [[1]]
##
## Stationarity start p-value
## test iteration
## beta10 passed 1 0.3239
## beta11 passed 1 0.6880
## beta12 passed 1 0.2287
## beta13 passed 1 0.7323
## beta2 passed 1 0.7564
## beta3 passed 1 0.7176
## beta4 passed 1 0.0622
## beta8 passed 1 0.2670
## beta9 passed 1 0.6451
## deviance passed 1 0.1120
##
## Halfwidth Mean Halfwidth
## test
## beta10 passed -0.4835 0.014111
## beta11 passed 0.6757 0.020904
## beta12 passed -0.8545 0.012844
## beta13 passed -0.8541 0.017958
## beta2 passed -1.6552 0.029211
## beta3 passed 0.8795 0.012322
## beta4 passed -0.0147 0.000559
## beta8 passed 0.0316 0.000491
## beta9 passed -0.8927 0.025430
## deviance passed 225.2703 0.297930
##
## [[2]]
##
## Stationarity start p-value
## test iteration
## beta10 passed 1 0.233
## beta11 passed 1 0.254
## beta12 passed 1 0.855
## beta13 passed 1 0.784
## beta2 passed 1 0.699
## beta3 passed 1 0.153
## beta4 passed 1 0.642
## beta8 passed 1 0.812
## beta9 passed 1 0.423
## deviance passed 1 0.769
##
## Halfwidth Mean Halfwidth
## test
## beta10 passed -0.4928 0.013672
## beta11 passed 0.6839 0.022893
## beta12 passed -0.8391 0.013463
## beta13 passed -0.8527 0.016625
## beta2 passed -1.6389 0.028610
## beta3 passed 0.8842 0.011386
## beta4 passed -0.0149 0.000536
## beta8 passed 0.0315 0.000566
## beta9 passed -0.8891 0.027043
## deviance passed 225.3264 0.268161
##
## [[3]]
##
## Stationarity start p-value
## test iteration
## beta10 passed 1 0.3448
## beta11 passed 1 0.1388
## beta12 passed 1 0.6863
## beta13 passed 1 0.1368
## beta2 passed 1 0.0714
## beta3 passed 1 0.3191
## beta4 passed 1 0.5506
## beta8 passed 1 0.8238
## beta9 passed 1 0.8632
## deviance passed 1 0.5795
##
## Halfwidth Mean Halfwidth
## test
## beta10 passed -0.4906 0.013692
## beta11 passed 0.6899 0.022293
## beta12 passed -0.8431 0.012653
## beta13 passed -0.8475 0.018361
## beta2 passed -1.6604 0.031817
## beta3 passed 0.8795 0.014677
## beta4 passed -0.0142 0.000546
## beta8 passed 0.0310 0.000524
## beta9 passed -0.8847 0.031369
## deviance passed 225.4861 0.294261
\[~\] We prefer the Heidelberger-Welch test because it double-checks both stationarity and convergence. It sometimes fails where the other tools seem fine, so I prefer to rely on this test.
Here we can see that everything checks out, so our intuitions are correct!
\[~\]
\[~\]
Here, we want to illustrate how to use the Markov chain to approximate the posterior predictive distribution of whether a person could have a heart attack or not, given the features explained above. So, we want to approximate:
\[ m(y_{new}) = \int_{\Theta} f(y_{new}|\theta, Y_{old}, x_2, x_3, x_4, x_8, x_9, x_{10}, x_{11}, x_{12}, x_{13})\pi(\theta)d\theta \]
where \(\theta\) is the vector of coefficients used and considered before, \(\Theta\) is the parameter space, and \(\pi(\theta)\) is the product of the priors thanks to the marginal independence assumption. At the end, we want to sample from \(\pi(\theta)\) and from \(f(y_{new}|\theta, Y_{old}, x_2, x_3, x_4, x_8, x_9, x_{10}, x_{11}, x_{12}, x_{13})\).
Here, there is a brief practical example:
# Draw one hypothetical new record by resampling each predictor
new_record <- list(x2 = sample(x2, 1, replace = T), x3 = sample(x3, 1, replace = T),
                   x4 = sample(x4, 1, replace = T), x8 = sample(x8, 1, replace = T),
                   x9 = sample(x9, 1, replace = T), x10 = sample(x10, 1, replace = T),
                   x11 = sample(x11, 1, replace = T), x12 = sample(x12, 1, replace = T),
                   x13 = sample(x13, 1, replace = T))
# Linear predictor for each saved posterior draw
sims <- mod.fit$BUGSoutput$sims.matrix
x <- sims[, "beta2"]*new_record$x2 + sims[, "beta3"]*new_record$x3 +
  sims[, "beta4"]*new_record$x4 + sims[, "beta8"]*new_record$x8 +
  sims[, "beta9"]*new_record$x9 + sims[, "beta10"]*new_record$x10 +
  sims[, "beta11"]*new_record$x11 + sims[, "beta12"]*new_record$x12 +
  sims[, "beta13"]*new_record$x13
x_mu <- 1/(1+exp(-x))
y_pred <- unlist(lapply(x_mu, function(x) rbinom(n = 1, size = 1, prob = x)))
sd(y_pred)
## [1] 0.3478183
prop.table(table(y_pred))
## y_pred
## 0 1
## 0.8592593 0.1407407
prop.table(table(y))
## y
## 0 1
## 0.4569536 0.5430464
summary(y_pred)[4]
## Mean
## 0.1407407
hchart(y_pred, type = "column", name = "model1", color = randomcoloR::randomColor()) %>%
hc_title(text = "Predictions with model 1") %>%
hc_xAxis(title = "model1") %>%
hc_chart(options3d=list(enabled=TRUE, alpha=2, beta=-10,
depth=100, viewDistance=25)) %>%
hc_plotOptions(column=list(depth= 100))
After reading the whole project report, we will see which model is the best in terms of predictions. Stay tuned!
\[~\]
\[~\]
Here, we show the correlations between all the values calculated during the MCMC sampling, considering the pooled simulation matrix:
The highest correlation is between \(\beta_4\) and \(\beta_8\).
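A sketch of how these posterior correlations can be computed from the pooled simulation matrix (`mod.fit` from the JAGS run above):

```r
# Correlation matrix of all tracked quantities across the pooled chains.
sims <- mod.fit$BUGSoutput$sims.matrix
post.cor <- cor(sims)
# Locate the largest off-diagonal correlation in absolute value:
diag(post.cor) <- 0
which(abs(post.cor) == max(abs(post.cor)), arr.ind = TRUE)
```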
\[~\]
\[~\] Now, we want to focus on another model: again a logistic regression, with all the features, but with a new link function, the cloglog.
Why this change? Because we want to see which model is better… and how can you tell? The comparison below will make the preference easy to see.
What is the link function? The link function transforms the probabilities of the levels of a categorical response variable to a continuous, unbounded scale. Once the transformation is complete, the relationship between the predictors and the response can be modeled, for example with logistic regression.
As we can see, we reproduce this second model by changing the link function, to see whether it is better than the previous model:
\[~\]
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 302
## Unobserved stochastic nodes: 13
## Total graph size: 5250
##
## Initializing model
## Inference for Bugs model at "C:/Users/Jeremy/AppData/Local/Temp/RtmpM1kG8j/model5704361e22ff.txt", fit using jags,
## 3 chains, each with 10000 iterations (first 1000 discarded), n.thin = 10
## n.sims = 2700 iterations saved
## mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
## beta1 0.002 0.013 -0.023 -0.006 0.002 0.010 0.027 1.002 1300
## beta10 -0.261 0.132 -0.530 -0.347 -0.261 -0.171 -0.006 1.001 2700
## beta11 0.423 0.221 0.003 0.264 0.421 0.573 0.872 1.001 2700
## beta12 -0.652 0.149 -0.950 -0.750 -0.646 -0.552 -0.371 1.001 2700
## beta13 -0.483 0.182 -0.845 -0.605 -0.485 -0.361 -0.134 1.001 2700
## beta2 -1.205 0.272 -1.787 -1.381 -1.196 -1.016 -0.696 1.002 1900
## beta3 0.521 0.115 0.301 0.445 0.519 0.599 0.747 1.001 2700
## beta4 -0.010 0.006 -0.023 -0.014 -0.010 -0.006 0.002 1.003 670
## beta5 -0.004 0.002 -0.008 -0.005 -0.004 -0.002 0.001 1.011 190
## beta6 0.193 0.321 -0.441 -0.020 0.200 0.404 0.828 1.001 2700
## beta7 0.280 0.212 -0.136 0.138 0.280 0.417 0.698 1.001 2700
## beta8 0.022 0.005 0.012 0.018 0.022 0.025 0.031 1.004 610
## beta9 -0.621 0.277 -1.181 -0.798 -0.617 -0.434 -0.078 1.001 2700
## deviance 225.103 5.257 216.888 221.389 224.364 228.050 237.550 1.000 2700
##
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
##
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 13.8 and DIC = 238.9
## DIC is an estimate of expected predictive error (lower deviance is better).
\[~\]
Can you see something different? It seems so… look at the DIC! The previous model is better (although the difference is very small).
…hmm, what is the DIC?
\[~\]
\[~\] As we can see above, we tried a different model, changing only the link function, to see whether it fits our data better; the DIC (deviance information criterion) is the indicator that tells us which model is better.
The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. DIC is an asymptotic approximation as the sample size becomes large, like AIC.
The DIC is calculated as:
\[ DIC = p_D + \overline{D(\theta)} \] where \(p_D\) is the effective number of parameters and \(\overline{D(\theta)}\) is the posterior mean of the deviance.
The larger the effective number of parameters, the easier it is for the model to fit the data, so the deviance needs to be penalized accordingly.
The lower the DIC value, the better the model; in this case the first model is the better one.
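The DIC formula above can be checked by hand from the saved deviance draws, using the rule quoted in the R2jags output (pD = var(deviance)/2):

```r
# Recompute pD and DIC for model 1 from the pooled deviance draws.
dev <- mod.fit$BUGSoutput$sims.matrix[, "deviance"]
pD  <- var(dev) / 2
DIC <- mean(dev) + pD
c(pD = pD, DIC = DIC)   # about 10.0 and 235.3 for model 1
```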
\[~\]
\[~\] We plot the logistic regression using the parameters obtained in the first model, comparing the fitted curve for each parameter-response pair.
We prefer to show only the significant ones.
\[~\]
\[~\]
We saw the strengths and weaknesses of Bayesian analysis.
As a first step we built the model in JAGS, "cheating" a little on which parameters might be good for our purposes; after that we made our usual considerations and then compared it with another, somewhat worse model. Of course, in my opinion, many little things in this analysis could be improved in the future.
So, thanks for everything!
\[~\]
\[~\]
Heart Attack Analysis & Prediction Dataset - [Kaggle] https://www.kaggle.com/rashikrahmanpritom/heart-attack-analysis-prediction-dataset
For graphical plots - [Kaggle] https://www.kaggle.com/chaitanya99/heart-attack-analysis-prediction/data
Link functions, the model types - [Alteryx] https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Selecting-a-Logistic-Regression-Model-Type-Logit-Probit-or/ta-p/111269
HighCharter, for the amazing plots in R - [High Charter] https://jkunst.com/highcharter/articles/hchart.html
Coronary artery disease - [PubMed] https://pubmed.ncbi.nlm.nih.gov/3739881
Nitroglycerin - [Wikipedia] https://en.wikipedia.org/wiki/Nitroglycerin
Vessel Diseases - [Digirad] https://www.digirad.com/triple-vessel-disease-diagnosis-treatment/
Different types of Angina - [CardioSmart] https://www.cardiosmart.org/topics/angina
Logistic Regression Bayesian Model R - [BayesBall] https://bayesball.github.io/BOOK/bayesian-multiple-regression-and-logistic-models.html
Typical Angina - [NCBI] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5680106